This dataset comes from Winchester et al.’s (2014) article, “Dental Topography of Platyrrhines and Prosimians: Convergence and Contrasts.” The main question of the article asks if dietary preference can be figured out based on tooth measurements and, if so, are platyrrhines showing enough topographical variation to apply these models to extinct taxa? After running a mix of ANOVA, PGLS, and DFA, main results seem to conclude that platyrrhines are broadly similar to the prosimians. However, using dietary categories to assign groups can still be accurately assigned 82% of the time with a prosimian-only sample and 94% of the time with a platyrrhine only sample, with a combined percentage of 77% when testing “unknown” individuals. For clarity, here is a list of the main topographic measurements done in this study on platyrrhine samples:
Variables measured: Five variables on M2
For my replication, I chose to focus on testing each variable for normality, using boxplots to illustrate relationships between each variable when separated by prosimians and platyrrhines (Figure 2), the variability between each topographic measurement using ANOVA and linear models (Table 3), and using a linear discriminant analysis to test each topographic variable against diet based on group (Figure 3 and Figure 4).
First, we will read in the dataset and check for normality.
> library(ggplot2)
> library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
> library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
> # Load in specimen name/variable dataset and name it "SpecData"
> SpecData <- read.csv("specvariable_repdata.csv")
>
> # Quick look at the dataset
> head(SpecData)
## Specimen Group Species M2.Length RFI
## 1 USNM 171063 Platyrrhini Alouatta palliata aequatorialis 7.34 0.569
## 2 USNM 284782 Platyrrhini Alouatta palliata aequatorialis 8.03 0.594
## 3 USNM 290601 Platyrrhini Alouatta palliata aequatorialis 7.83 0.547
## 4 SB N Al 13 Platyrrhini Alouatta seniculus 7.91 0.572
## 5 USNM 123517 Platyrrhini Alouatta seniculus 7.75 0.536
## 6 USNM 281658 Platyrrhini Alouatta seniculus 7.73 0.557
## DNE OPCR Shearing.ratio Shearing.quotient Diet
## 1 167.027 52.250 3.050 -0.174 Folivory
## 2 207.057 53.625 3.362 2.719 Folivory
## 3 178.497 53.250 3.002 -0.676 Folivory
## 4 170.393 51.625 3.092 -0.830 Folivory
## 5 172.125 54.000 2.986 -1.486 Folivory
## 6 170.079 54.625 3.030 -0.929 Folivory
> #Quick summary of dataset values
> summary(SpecData)
## Specimen Group Species
## AMNH 100503: 2 Platyrrhini:111 Brachyteles arachnoides: 10
## AMNH 100632: 2 Prosimii :121 Aotus nigriceps : 8
## AMNH 100640: 2 Alouatta seniculus : 7
## AMNH 100829: 2 Avahi laniger : 7
## AMNH 100832: 2 Microcebus griseorufus : 7
## AMNH 170461: 2 Saimiri boliviensis : 7
## (Other) :220 (Other) :186
## M2.Length RFI DNE OPCR
## Min. :1.500 Min. :0.3250 Min. : 70.44 Min. :30.12
## 1st Qu.:3.350 1st Qu.:0.4878 1st Qu.:133.43 1st Qu.:47.09
## Median :4.100 Median :0.5135 Median :162.25 Median :52.88
## Mean :4.548 Mean :0.5122 Mean :171.63 Mean :54.75
## 3rd Qu.:5.702 3rd Qu.:0.5533 3rd Qu.:203.84 3rd Qu.:60.09
## Max. :8.720 Max. :0.6730 Max. :354.60 Max. :94.88
##
## Shearing.ratio Shearing.quotient Diet
## Min. :2.248 Min. :-8.0660 Folivory :50
## 1st Qu.:2.659 1st Qu.:-1.8540 Frugivory :62
## Median :2.971 Median : 0.6645 Hard-Object Feeding:47
## Mean :2.994 Mean : 2.5183 Insectivory :25
## 3rd Qu.:3.301 3rd Qu.: 5.4460 Omnivory :48
## Max. :4.167 Max. :25.9420
## NA's :6 NA's :6
Now I will use qqplots to check for normality in each of the topographic variables.
> # For M2 Length
> qqnorm(SpecData$M2.Length, main="M2.Length Normality Check")
> qqline(SpecData$M2.Length, col="black")
> # For OPCR
> qqnorm(SpecData$OPCR, main="OPCR Normality Check")
> qqline(SpecData$OPCR, col="black")
> qqnorm(SpecData$Shearing.ratio, main="Shearing Ratio Normality Check")
> qqline(SpecData$Shearing.ratio, col="black")
> qqnorm(SpecData$Shearing.quotient, main="Shearing Quotient Normality Check")
> qqline(SpecData$Shearing.quotient, col="black")
> qqnorm(SpecData$DNE, main="DNE Normality Check")
> qqline(SpecData$DNE, col="black")
> RFI <- qqnorm(SpecData$RFI, main="RFI Normality Check")
> qqline(SpecData$RFI, col="black")
As can be seen, the plots seem to skew off the best fit line at each end. This is likely due to some extremes in each topographic variable that skews out the expected distribution.
Now that we have the data loaded in, we want to recreate the first set of boxplots (Figure 2) based on dental topography measurements for 5 types of feeding behaviors measured in the Winchester et al. (2014) paper.
> # Recreating Figure 2: Boxplots of each Dental topography to the type of diet
> library(forcats)
> OPCR <- SpecData %>%
+ mutate(Diet = fct_relevel(Diet,
+ "Insectivory", "Folivory", "Omnivory",
+ "Frugivory", "Hard-Object Feeding")) %>%
+ ggplot( aes(x=Diet, y=OPCR)) +
+ geom_boxplot(aes(fill=Group)) +
+ xlab("Diet")
>
> SR <- SpecData %>%
+ mutate(Diet = fct_relevel(Diet,
+ "Insectivory", "Folivory", "Omnivory",
+ "Frugivory", "Hard-Object Feeding")) %>%
+ ggplot( aes(x=Diet, y=Shearing.ratio)) +
+ geom_boxplot(aes(fill=Group)) +
+ xlab("Diet")
>
> SQ <- SpecData %>%
+ mutate(Diet = fct_relevel(Diet,
+ "Insectivory", "Folivory", "Omnivory",
+ "Frugivory", "Hard-Object Feeding")) %>%
+ ggplot( aes(x=Diet, y=Shearing.quotient)) +
+ geom_boxplot(aes(fill=Group)) +
+ xlab("Diet")
>
> DNE <- SpecData %>%
+ mutate(Diet = fct_relevel(Diet,
+ "Insectivory", "Folivory", "Omnivory",
+ "Frugivory", "Hard-Object Feeding")) %>%
+ ggplot( aes(x=Diet, y=DNE)) +
+ geom_boxplot(aes(fill=Group)) +
+ xlab("Diet")
>
> RFI <- SpecData %>%
+ mutate(Diet = fct_relevel(Diet,
+ "Insectivory", "Folivory", "Omnivory",
+ "Frugivory", "Hard-Object Feeding")) %>%
+ ggplot( aes(x=Diet, y=RFI)) +
+ geom_boxplot(aes(fill=Group)) +
+ xlab("Diet")
>
> #Call each to check on the data, note that 6 rows are removed in the shearing.ratio and shearing.quotient due to lack of data for Daubentonidae
> OPCR
> SR
> SQ
> DNE
> RFI
Here is Figure 2 from the Winchester Paper. My replicated versions match the original versions, however, mine lack some outlier points opposed to the original boxplots.
We will next include two-way ANOVAs for the platyrrhine sample by dietary group factor. But first, we should look at quick plots to see how our data pans out.
> #Checking the Class of Group and Diet
> class(SpecData$Group)
## [1] "factor"
> class(SpecData$Diet)
## [1] "factor"
> # Making a quick plot to look at the differences in dietary type
> plot (SpecData$Diet ~ SpecData$Group)
Now that we can visualize the data, let’s make our first linear model to test if there are significant differences between each based on Group, Diet, and Group x Diet. This shows the variation in each topographic factor within each Group that is explained by Diet. The analysis here replicates Table 3 from the Winchester paper. I will also create a linear model equation for each in order to verify the p-values stated in the Winchester et al. paper.
> # Running 2-Way ANOVA for DNE
> DNEaov <- aov(data = SpecData, DNE~ Group * Diet)
> summary(DNEaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 145009 145009 230.1 < 2e-16 ***
## Diet 4 348229 87057 138.2 < 2e-16 ***
## Group:Diet 3 28923 9641 15.3 4.36e-09 ***
## Residuals 223 140516 630
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> # Quick Plot of DNE ANOVA
> par(mfrow = c(2, 2))
> plot(DNEaov)
> # Running linear model for DNE to retrieve p-value
> DNElm <- lm(data = SpecData, DNE~ Group * Diet)
> summary(DNElm)
##
## Call:
## lm(formula = DNE ~ Group * Diet, data = SpecData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.671 -12.837 -1.249 14.914 83.322
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 174.618 5.613 31.110 < 2e-16
## GroupProsimii 38.932 7.246 5.373 1.95e-07
## DietFrugivory -25.770 6.874 -3.749 0.000227
## DietHard-Object Feeding -55.961 6.846 -8.174 2.24e-14
## DietInsectivory 57.732 6.798 8.493 2.86e-15
## DietOmnivory 9.657 9.722 0.993 0.321644
## GroupProsimii:DietFrugivory -48.803 9.844 -4.958 1.41e-06
## GroupProsimii:DietHard-Object Feeding -81.699 13.149 -6.213 2.52e-09
## GroupProsimii:DietInsectivory NA NA NA NA
## GroupProsimii:DietOmnivory -39.952 11.494 -3.476 0.000612
##
## (Intercept) ***
## GroupProsimii ***
## DietFrugivory ***
## DietHard-Object Feeding ***
## DietInsectivory ***
## DietOmnivory
## GroupProsimii:DietFrugivory ***
## GroupProsimii:DietHard-Object Feeding ***
## GroupProsimii:DietInsectivory
## GroupProsimii:DietOmnivory ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.1 on 223 degrees of freedom
## Multiple R-squared: 0.788, Adjusted R-squared: 0.7804
## F-statistic: 103.6 on 8 and 223 DF, p-value: < 2.2e-16
Now that we have one done, we will do the same for RFI, OPCR, SR, and SQ. You will notice that the F values vary, but the p-values stay consistent to the table.
> # Running 2-Way ANOVA for RFI
> RFIaov <- aov(data = SpecData, RFI~ Group * Diet)
> summary(RFIaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 0.0516 0.05161 56.75 1.23e-12 ***
## Diet 4 0.5316 0.13289 146.11 < 2e-16 ***
## Group:Diet 3 0.0471 0.01571 17.27 4.03e-10 ***
## Residuals 223 0.2028 0.00091
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> # Quick Plot of DNE ANOVA
> par(mfrow = c(2, 2))
> plot(RFIaov)
> # Running linear model for RFI to retrieve p-value
> RFIlm <- lm(data = SpecData, RFI~ Group * Diet)
> summary(RFIlm)
##
## Call:
## lm(formula = RFI ~ Group * Diet, data = SpecData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.086227 -0.016993 -0.000298 0.016050 0.091880
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) 0.580100 0.006744 86.023
## GroupProsimii -0.050267 0.008706 -5.774
## DietFrugivory -0.051150 0.008259 -6.193
## DietHard-Object Feeding -0.083734 0.008226 -10.180
## DietInsectivory 0.051287 0.008167 6.280
## DietOmnivory -0.032800 0.011680 -2.808
## GroupProsimii:DietFrugivory -0.067456 0.011827 -5.704
## GroupProsimii:DietHard-Object Feeding -0.082266 0.015798 -5.208
## GroupProsimii:DietInsectivory NA NA NA
## GroupProsimii:DietOmnivory -0.007665 0.013809 -0.555
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## GroupProsimii 2.58e-08 ***
## DietFrugivory 2.81e-09 ***
## DietHard-Object Feeding < 2e-16 ***
## DietInsectivory 1.75e-09 ***
## DietOmnivory 0.00542 **
## GroupProsimii:DietFrugivory 3.70e-08 ***
## GroupProsimii:DietHard-Object Feeding 4.34e-07 ***
## GroupProsimii:DietInsectivory NA
## GroupProsimii:DietOmnivory 0.57940
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03016 on 223 degrees of freedom
## Multiple R-squared: 0.7566, Adjusted R-squared: 0.7478
## F-statistic: 86.62 on 8 and 223 DF, p-value: < 2.2e-16
> # Running 2-Way ANOVA for OPCR
> OPCRaov <- aov(data = SpecData, OPCR~ Group * Diet)
> summary(OPCRaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 13848 13848 175.24 < 2e-16 ***
## Diet 4 3730 933 11.80 1.03e-08 ***
## Group:Diet 3 2499 833 10.54 1.64e-06 ***
## Residuals 223 17622 79
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> # Quick Plot of DNE ANOVA
> par(mfrow = c(2, 2))
> plot(OPCRaov)
> # Running linear model for DNE to retrieve p-value
> OPCRlm <- lm(data = SpecData, OPCR~ Group * Diet)
> summary(OPCRlm)
##
## Call:
## lm(formula = OPCR ~ Group * Diet, data = SpecData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.668 -5.475 -0.207 5.177 34.369
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.375 1.988 26.852 < 2e-16
## GroupProsimii -2.869 2.566 -1.118 0.26474
## DietFrugivory 8.900 2.434 3.656 0.00032
## DietHard-Object Feeding 16.293 2.425 6.720 1.50e-10
## DietInsectivory 2.365 2.407 0.982 0.32692
## DietOmnivory 2.400 3.443 0.697 0.48647
## GroupProsimii:DietFrugivory -17.363 3.486 -4.981 1.27e-06
## GroupProsimii:DietHard-Object Feeding -20.049 4.657 -4.305 2.50e-05
## GroupProsimii:DietInsectivory NA NA NA NA
## GroupProsimii:DietOmnivory -8.511 4.070 -2.091 0.03766
##
## (Intercept) ***
## GroupProsimii
## DietFrugivory ***
## DietHard-Object Feeding ***
## DietInsectivory
## DietOmnivory
## GroupProsimii:DietFrugivory ***
## GroupProsimii:DietHard-Object Feeding ***
## GroupProsimii:DietInsectivory
## GroupProsimii:DietOmnivory *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.89 on 223 degrees of freedom
## Multiple R-squared: 0.5326, Adjusted R-squared: 0.5158
## F-statistic: 31.76 on 8 and 223 DF, p-value: < 2.2e-16
> # Running 2-Way ANOVA for SR
> SRaov <- aov(data = SpecData, Shearing.ratio~ Group * Diet)
> summary(SRaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 13.194 13.194 217.058 < 2e-16 ***
## Diet 4 17.744 4.436 72.980 < 2e-16 ***
## Group:Diet 2 0.572 0.286 4.705 0.00999 **
## Residuals 218 13.251 0.061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
> # Quick Plot of SR ANOVA
> par(mfrow = c(2, 2))
> plot(SRaov)
> # Running linear model for SR to retrieve p-value
> SRlm <- lm(data = SpecData, Shearing.ratio~ Group * Diet)
> summary(SRlm)
##
## Call:
## lm(formula = Shearing.ratio ~ Group * Diet, data = SpecData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58768 -0.14935 -0.03402 0.12855 1.41893
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.20250 0.05513 58.091 < 2e-16
## GroupProsimii 0.32123 0.07117 4.513 1.04e-05
## DietFrugivory -0.38500 0.06752 -5.702 3.82e-08
## DietHard-Object Feeding -0.74543 0.06724 -11.085 < 2e-16
## DietInsectivory 0.04071 0.06677 0.610 0.54270
## DietOmnivory -0.44940 0.09549 -4.706 4.48e-06
## GroupProsimii:DietFrugivory -0.28605 0.09669 -2.959 0.00343
## GroupProsimii:DietHard-Object Feeding NA NA NA NA
## GroupProsimii:DietInsectivory NA NA NA NA
## GroupProsimii:DietOmnivory -0.07402 0.11289 -0.656 0.51272
##
## (Intercept) ***
## GroupProsimii ***
## DietFrugivory ***
## DietHard-Object Feeding ***
## DietInsectivory
## DietOmnivory ***
## GroupProsimii:DietFrugivory **
## GroupProsimii:DietHard-Object Feeding
## GroupProsimii:DietInsectivory
## GroupProsimii:DietOmnivory
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2465 on 218 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.704, Adjusted R-squared: 0.6945
## F-statistic: 74.06 on 7 and 218 DF, p-value: < 2.2e-16
> # Running 2-Way ANOVA for SR
> SQaov <- aov(data = SpecData, Shearing.quotient~ Group * Diet)
> summary(SQaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 3458 3458 176.858 < 2e-16 ***
## Diet 4 3772 943 48.234 < 2e-16 ***
## Group:Diet 2 279 140 7.147 0.000985 ***
## Residuals 218 4262 20
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
> # Quick Plot of SR ANOVA
> par(mfrow = c(2, 2))
> plot(SQaov)
> # Running linear model for SR to retrieve p-value
> SQlm <- lm(data = SpecData, Shearing.quotient~ Group * Diet)
> summary(SQlm)
##
## Call:
## lm(formula = Shearing.quotient ~ Group * Diet, data = SpecData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.2992 -2.1801 -0.3344 1.9713 20.9385
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01920 0.98871 -0.019 0.984524
## GroupProsimii 6.56610 1.27642 5.144 5.98e-07
## DietFrugivory 0.03303 1.21091 0.027 0.978267
## DietHard-Object Feeding -4.08331 1.20598 -3.386 0.000841
## DietInsectivory 9.59126 1.19738 8.010 6.83e-14
## DietOmnivory 0.58320 1.71249 0.341 0.733765
## GroupProsimii:DietFrugivory -6.55465 1.73398 -3.780 0.000202
## GroupProsimii:DietHard-Object Feeding NA NA NA NA
## GroupProsimii:DietInsectivory NA NA NA NA
## GroupProsimii:DietOmnivory -3.67963 2.02455 -1.818 0.070513
##
## (Intercept)
## GroupProsimii ***
## DietFrugivory
## DietHard-Object Feeding ***
## DietInsectivory ***
## DietOmnivory
## GroupProsimii:DietFrugivory ***
## GroupProsimii:DietHard-Object Feeding
## GroupProsimii:DietInsectivory
## GroupProsimii:DietOmnivory .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.422 on 218 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.6263
## F-statistic: 54.87 on 7 and 218 DF, p-value: < 2.2e-16
Now that we have compared all of the df, F, and p-values to Table 3. We can see that there is a trend toward a larger F value than the ones recorded in Table 3. There is generally a trend of Group and Diet having larger values than the (Group * Diet) F-values.
Now we will try to run the first discriminant function analysis of first just prosimians.
> library(dplyr)
> library(MASS)
>
> ## Building a Model for Group based on Diet
> model <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData %>% filter(Group=='Prosimii'))
> model
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData %>%
## filter(Group == "Prosimii"))
##
## Prior probabilities of groups:
## Folivory Frugivory Hard-Object Feeding
## 0.24793388 0.18181818 0.04958678
## Insectivory Omnivory
## 0.20661157 0.31404959
##
## Group means:
## DNE RFI OPCR M2.Length
## Folivory 213.55003 0.5298333 50.50583 5.486667
## Frugivory 138.97691 0.4112273 42.04318 4.818182
## Hard-Object Feeding 75.89017 0.3638333 46.75000 4.416667
## Insectivory 271.28220 0.5811200 52.87100 2.836000
## Omnivory 183.25521 0.4893684 44.39474 3.868421
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3 LD4
## DNE -0.01589503 -0.01071809 0.02255529 -0.02297857
## RFI -18.23048808 6.78089272 -12.03709733 22.84098987
## OPCR -0.02329067 -0.06258873 -0.09283985 -0.07535709
## M2.Length 0.15158064 -0.74713694 0.16843943 0.01343694
##
## Proportion of trace:
## LD1 LD2 LD3 LD4
## 0.8834 0.0888 0.0150 0.0129
> predictions <- model %>% predict(SpecData)
> names(predictions)
## [1] "class" "posterior" "x"
> # Predicted classes
> head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
> # Predicted probabilities of class memebership.
> head(predictions$posterior, 6)
## Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
## 1 0.9159127 3.057234e-04 2.061726e-07 0.0012453890 0.082536026
## 2 0.9888923 6.152336e-06 2.414418e-10 0.0043960416 0.006705521
## 3 0.9661582 6.706555e-04 2.750846e-07 0.0003241260 0.032846774
## 4 0.9537723 2.192882e-04 9.716749e-08 0.0006941445 0.045314188
## 5 0.9554777 1.739290e-03 1.288420e-06 0.0001830356 0.042598715
## 6 0.9609591 4.103401e-04 2.802489e-07 0.0004459847 0.038184319
> # Linear discriminants
> head(predictions$x, 3)
## LD1 LD2 LD3 LD4
## 1 -0.4885521 -1.822708 -1.4349762 1.951717
## 2 -1.5080264 -2.683815 -0.8444470 1.508565
## 3 -0.2188135 -2.523510 -0.9217554 1.116878
> # Make graph
> lda.data <- cbind(SpecData %>% filter(Group=='Prosimii'), predict(model)$x)
> ggplot(lda.data, aes(-LD1,-LD2), inherit.aes = FALSE) +
+ geom_point(aes(color = Diet))
Next, we will do one for only Platyrrhines. It is important to note that the original paper does not include a platyrrhine-only DFA.
> library(dplyr)
> library(MASS)
>
> ## Building a Model for Group based on Diet
> model2 <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData %>% filter(Group=='Platyrrhini'))
> model2
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData %>%
## filter(Group == "Platyrrhini"))
##
## Prior probabilities of groups:
## Folivory Frugivory Hard-Object Feeding
## 0.18018018 0.36036036 0.36936937
## Omnivory
## 0.09009009
##
## Group means:
## DNE RFI OPCR M2.Length
## Folivory 174.6179 0.5801000 53.37500 7.838500
## Frugivory 148.8478 0.5289500 62.27500 4.598450
## Hard-Object Feeding 118.6573 0.4963659 69.66768 4.186902
## Omnivory 184.2747 0.5473000 55.77500 2.780000
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3
## DNE 0.04122506 0.03510623 0.035299153
## RFI 26.96641606 2.99763767 -39.128427049
## OPCR -0.04206473 -0.05342556 -0.009063623
## M2.Length 0.77659863 -0.93603879 0.315724242
##
## Proportion of trace:
## LD1 LD2 LD3
## 0.8195 0.1805 0.0000
> predictions <- model2 %>% predict(SpecData)
> names(predictions)
## [1] "class" "posterior" "x"
> # Predicted classes
> head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
> # Predicted probabilities of class memebership.
> head(predictions$posterior, 6)
## Folivory Frugivory Hard-Object Feeding Omnivory
## 1 0.9999020 9.796597e-05 2.921329e-11 3.261446e-08
## 2 1.0000000 9.851119e-11 5.420283e-21 4.176638e-12
## 3 0.9999768 2.315618e-05 4.729675e-12 5.247475e-09
## 4 0.9999986 1.421014e-06 9.854092e-14 2.388784e-10
## 5 0.9994613 5.386199e-04 8.626135e-10 3.417559e-08
## 6 0.9999497 5.030303e-05 2.436862e-11 4.159582e-09
> # Linear discriminants
> head(predictions$x, 3)
## LD1 LD2 LD3
## 1 4.363082 -0.8726762 0.02280032
## 2 7.165496 -0.1117598 0.66300197
## 3 4.581141 -1.0480404 1.43414825
> # Make graph
> lda.data2 <- cbind(SpecData %>% filter(Group=='Platyrrhini'), predict(model2)$x)
> ggplot(lda.data2, aes(-LD1,-LD2), inherit.aes = FALSE) +
+ geom_point(aes(color = Diet))
Finally, we will replicate Figure 4 from the paper to show a discriminant function analysis of topographic metrics and molar area for platyrrhines and prosimians.
> library(dplyr)
> library(MASS)
>
> ## Building a Model for Group based on Diet
> model3 <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData)
> model3
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData)
##
## Prior probabilities of groups:
## Folivory Frugivory Hard-Object Feeding
## 0.2155172 0.2672414 0.2025862
## Insectivory Omnivory
## 0.1077586 0.2068966
##
## Group means:
## DNE RFI OPCR M2.Length
## Folivory 197.9772 0.5499400 51.65350 6.427400
## Frugivory 145.3453 0.4871774 55.09597 4.676419
## Hard-Object Feeding 113.1977 0.4794468 66.74202 4.216234
## Insectivory 271.2822 0.5811200 52.87100 2.836000
## Omnivory 183.4676 0.5014375 46.76562 3.641667
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3 LD4
## DNE -0.03249541 -0.003744819 -0.008181301 -0.02150385
## RFI -7.37543201 1.574661968 -1.017144473 23.69184454
## OPCR 0.05103574 -0.017905603 -0.082974649 -0.04326357
## M2.Length -0.07044613 -0.804654184 -0.018973693 -0.23370875
##
## Proportion of trace:
## LD1 LD2 LD3 LD4
## 0.8019 0.1596 0.0380 0.0005
> predictions <- model3 %>% predict(SpecData)
> names(predictions)
## [1] "class" "posterior" "x"
> # Predicted classes
> head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
> # Predicted probabilities of class memebership.
> head(predictions$posterior, 6)
## Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
## 1 0.9216942 0.0575717561 2.940098e-04 2.092461e-05 0.020419119
## 2 0.9982200 0.0005750234 2.556513e-07 1.487040e-04 0.001056058
## 3 0.9751393 0.0194941452 6.293818e-05 1.107185e-05 0.005292510
## 4 0.9741677 0.0192914560 6.011584e-05 1.058124e-05 0.006470191
## 5 0.9448672 0.0472935560 2.706440e-04 5.854642e-06 0.007562723
## 6 0.9501963 0.0416342125 2.415409e-04 7.620188e-06 0.007920284
> # Linear discriminants
> head(predictions$x, 3)
## LD1 LD2 LD3 LD4
## 1 -0.5934099 -2.095032 0.13440156 0.90018043
## 2 -2.0570207 -2.785402 -0.34570652 0.41093108
## 3 -0.7873557 -2.584814 -0.02933254 -0.02547014
> # Make graph
> lda.data3 <- cbind(SpecData, predict(model3)$x)
> ggplot(lda.data3, aes(-LD1,-LD2), inherit.aes = FALSE) +
+ geom_point(aes(color = Diet, shape = Group))
Success! Now we have successfully replicated Figure 4. You’ll notice the main difference is that color is used in this graph to discern Diet whereas shapes are used to discern Group. I do not think this is more effective than the shape drawing but I do think that the addition of color helps quickly differentiate between Group and Diet.
After running these analyses, I have come to the overall conclusion that Winchester et al. (2014) have allowed for highly replicable research. Their datasets are complete and they explain many of their shortcomings throughout. I did notice that when using qqnorm() for each topograpic variable, there seemes to be a lot of non-normal skewing on the ends. This seems to be due to extreme outliers in the dataset.
When replicating Figure 2, I found that the graphs were almost exactly the same and found it fairly simple to replicate them. Using ANOVA and linear models to test the information in Table 3 was also a good test on earlier modules from the course. However, I noticed that the F statistics differed greatly in comparison to the original Table 3 values. I was unable to find a specific reason for this, but believe that it may have something to do with the different statistics programming software used by Winchester et al.
The most rewarding and also most difficult to replicate were Figures 3 and 4. Once Figure 3 was created, I had no problem creating a platyrrhine-only figure. This figure was not included in the original paper even though I think it would be a useful visualization of the paper’s sample. The creation of Figure 4 gave the most robust visualization of the dataset.
Overall, this was a positive experience that allowed me to dig into a dataset and test replicability.